ligDB - Online Query Processing Without (almost) any Storage
Abstract
In the big-data era, data arrives at such a high pace and volume that exploring and querying it is feasible only if data loading and indexing happen reasonably quickly, if at all. Recent research on handling large scientific data suggests skipping database indexing and even data loading altogether, and instead processing raw data as scientists hand it in, manually or by semi-automated means, in multiple iterative steps if needed. In this paper, we describe the anatomy and research challenges of a system called ligDB that operates purely on incomplete database tables, JSON documents, or sets of SPO (subject-predicate-object) triples that are filled over time. No data is stored per se: the only stored data stems from previously posed queries over the stream of arriving data, and it is kept as long as forthcoming queries use it and evicted otherwise. A key point is that the velocity dimension of “big data” allows queries to be processed as they are posted, with higher-level queries processed on historic query results (views) and on live data. Data that is not touched by any posted query is discarded immediately.
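The query-driven retention model described above can be illustrated with a minimal Python sketch. It is not ligDB's implementation: the names `ContinuousQuery`, `QueryDrivenStore`, `derive`, and `release`, the predicate-plus-projection query model, and the reference-counting eviction rule are all hypothetical assumptions chosen for illustration. Each arriving record is offered to every posted query; matching records are projected into that query's view, and records that no query touches are dropped at once. A higher-level query is seeded from an existing view (historic results) and then keeps consuming live data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A record may be partial: any subset of columns (an incomplete table row,
# a JSON document, or an SPO triple encoded as {"s": ..., "p": ..., "o": ...}).
Record = Dict[str, Any]


@dataclass
class ContinuousQuery:
    """Hypothetical standing query: a selection predicate plus a projection."""
    name: str
    predicate: Callable[[Record], bool]
    columns: List[str]

    def project(self, record: Record) -> Record:
        # Missing attributes become None instead of failing, since rows may be incomplete.
        return {c: record.get(c) for c in self.columns}


@dataclass
class QueryDrivenStore:
    """Keeps only query results (views); raw input records are never stored."""
    queries: Dict[str, ContinuousQuery] = field(default_factory=dict)
    views: Dict[str, List[Record]] = field(default_factory=dict)
    users: Dict[str, int] = field(default_factory=dict)  # assumed refcount per view
    discarded: int = 0

    def register(self, q: ContinuousQuery) -> None:
        self.queries[q.name] = q
        self.views.setdefault(q.name, [])
        self.users[q.name] = self.users.get(q.name, 0) + 1

    def ingest(self, record: Record) -> None:
        """Offer one arriving record to every posted query, then forget it."""
        touched = False
        for q in self.queries.values():
            if q.predicate(record):
                self.views[q.name].append(q.project(record))
                touched = True
        if not touched:
            self.discarded += 1  # no query touched it: dropped immediately

    def derive(self, base: str, q: ContinuousQuery) -> List[Record]:
        """Higher-level query: seed from the view of `base` (history), then stay live.

        Assumes q only reads columns that `base` already projected.
        """
        history = [r for r in self.views[base] if q.predicate(r)]
        self.register(q)
        self.views[q.name].extend(q.project(r) for r in history)
        return self.views[q.name]

    def release(self, name: str) -> None:
        """Evict a view once no posted query uses it any more (assumed policy)."""
        self.users[name] -= 1
        if self.users[name] == 0:
            del self.queries[name], self.views[name], self.users[name]


store = QueryDrivenStore()
store.register(ContinuousQuery("hot", lambda r: r.get("temp", 0) > 30, ["sensor", "temp"]))
for rec in [{"sensor": "a", "temp": 35}, {"sensor": "b", "temp": 12}, {"sensor": "c"}]:
    store.ingest(rec)
print(store.views["hot"])  # [{'sensor': 'a', 'temp': 35}]
print(store.discarded)     # 2

very_hot = ContinuousQuery("very_hot", lambda r: r.get("temp", 0) > 34, ["sensor"])
store.derive("hot", very_hot)               # answered from the stored "hot" view
store.ingest({"sensor": "d", "temp": 40})   # live data reaches both queries
store.release("hot")                        # "hot" is evicted; "very_hot" survives
print(store.views)  # {'very_hot': [{'sensor': 'a'}, {'sensor': 'd'}]}
```

Reference counting is only one plausible eviction policy: the abstract states only that results are kept while forthcoming queries use them, so time-based or cost-based eviction would fit the description equally well.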
Publication date: 2015